Language-independent Techniques for Automated Text Summarization

Author

  • Mark Last
Abstract

Text summarization is the process of distilling the most important information from one or more sources to produce an abridged version for a particular user (or users) and task (or tasks). Automatically generated summaries can significantly reduce the information overload on intelligence analysts in their daily work. Moreover, automated text summarization can be utilized for automated classification and filtering of text documents, information search over the Internet, content recommendation systems, online social networks, etc. The increasing trend of cross-border globalization, accompanied by the growing multi-linguality of the Internet, requires text summarization techniques to work equally well in multiple languages. However, only some of the automated summarization methods proposed in the literature can be defined as “multi-lingual” or “language-independent,” as they are not based on any morphological analysis of the summarized text. In this chapter, we present a novel approach to “language-independent” extractive summarization called MUSE (MUltilingual Sentence Extractor), which represents the summary as a collection of the most informative fragments of the summarized document without any language-specific text analysis. We use a Genetic Algorithm to find the best linear combination of 31 sentence scoring metrics based on vector and graph representations of text documents. Our summarization methodology is evaluated on two monolingual corpora of English and Hebrew documents and, in addition, on a bilingual collection of English and Hebrew documents. The results are compared to 15 statistical sentence scoring methods for extractive single-document summarization found in the literature and to several state-of-the-art summarization tools. These bilingual experiments show that the MUSE methodology significantly outperforms the existing approaches and tools in both languages.
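The idea of learning a linear combination of sentence scoring metrics with a Genetic Algorithm can be illustrated with a minimal sketch. This is not the MUSE implementation: the abstract does not list the 31 metrics or the genetic operators, so the three-column feature matrix, the overlap-based fitness function (a stand-in for a real summary-quality measure), the truncation selection, one-point crossover, Gaussian mutation, and all function names below are assumptions made purely for illustration.

```python
import random

def score_sentences(features, weights):
    """Score each sentence as a weighted sum of its metric values."""
    return [sum(w * f for w, f in zip(weights, row)) for row in features]

def extract(features, weights, k):
    """Return the indices of the k top-scoring sentences, in document order."""
    scores = score_sentences(features, weights)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def fitness(weights, features, reference, k):
    """Toy fitness (assumption): fraction of extracted sentences that also
    appear in a reference extract."""
    return len(set(extract(features, weights, k)) & set(reference)) / k

def evolve(features, reference, k, pop_size=20, generations=30, seed=0):
    """Search for a weight vector with a very small genetic algorithm.
    Requires at least two metrics per sentence (for the crossover point)."""
    rng = random.Random(seed)
    n = len(features[0])
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, features, reference, k), reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(n)
            child[j] += rng.gauss(0, 0.1)       # mutate a single weight
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, features, reference, k))

# Hypothetical data: 5 sentences, 3 made-up metric values per sentence.
features = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.1],
    [0.4, 0.4, 0.4],
    [0.8, 0.2, 0.9],
    [0.1, 0.9, 0.2],
]
reference = [0, 3]   # indices of sentences in a hypothetical gold extract

best = evolve(features, reference, k=2)
print(extract(features, best, k=2))
```

Because the metric values are plain numbers computed per sentence, nothing in the weighting or selection step depends on the language of the document; any language dependence would have to come from how the metrics themselves are computed.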

Similar resources

Systematic literature review of fuzzy logic based text summarization

“Information overload” is not a new term, but the massive development of technology, which enables easy and unlimited access to information anytime and anywhere, together with participation in and publishing of information, has escalated its impact. Assisting users’ informational searches with reduced reading and surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...

Towards multi-lingual summarization: A comparative analysis of sentence extraction methods on English and Hebrew corpora

The trend toward the growing multilinguality of the Internet requires text summarization techniques that work equally well in multiple languages. Only some of the automated summarization methods proposed in the literature, however, can be defined as “language-independent”, as they are not based on any morphological analysis of the summarized text. In this paper, we perform an in-depth comparativ...

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted considerable attention from researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we are facing large-volume documents, the extr...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

A System for Generating Cloze Test Items from Russian-Language Text

This paper studies the problem of automated educational test generation. We describe a procedure for generating cloze test items from Russian-language text, which consists of three steps: sentence splitting, sentence filtering, and question generation. The sentence filtering issue is discussed as an application of automatic summarization techniques. We describe a simple experimental system whic...

Publication date: 2010